Abstract — We describe a successive-cancellation list decoder for
polar codes, which is a generalization of the classic successive- 
cancellation decoder of Arikan. In the proposed list decoder, up 
to L decoding paths are considered concurrently at each decoding 
stage. Then, a single codeword is selected from the list as output. 
If the most likely codeword is selected, simulation results show 
that the resulting performance is very close to that of a maximum- 
likelihood decoder, even for moderate values of L. Alternatively, 
if a "genie" is allowed to pick the codeword from the list, the 
results are comparable to the current state of the art LDPC 
codes. Luckily, implementing such a helpful genie is easy. 

Our list decoder doubles the number of decoding paths at 
each decoding step, and then uses a pruning procedure to 
discard all but the L "best" paths. Nevertheless, a straightforward
implementation still requires $\Omega(L \cdot n^2)$ time, which is in stark
contrast with the $O(n \log n)$ complexity of the original
successive-cancellation decoder. We utilize the structure of polar codes
to overcome this problem. Specifically, we devise an efficient,
numerically stable, implementation taking only $O(L \cdot n \log n)$
time and $O(L \cdot n)$ space.



I. Introduction 

Polar codes, recently discovered by Arikan [1], are a major 
breakthrough in coding theory. They are the first and currently 
only family of codes known to have an explicit construction 
(no ensemble to pick from) and efficient encoding and decod- 
ing algorithms, while also being capacity achieving over binary 
input symmetric memoryless channels. Their probability of 
error is known to approach $O(2^{-\sqrt{n}})$ [2], with generalizations
giving even better asymptotic results [3].

Of course, "capacity achieving" is an asymptotic property, 
and the main sticking point of polar codes to date is that their 
performance at short to moderate block lengths is disappoint- 
ing. As we ponder why, we identify two possible culprits: 
either the codes themselves are inherently weak at these 
lengths, or the successive cancellation (SC) decoder employed 
to decode them is significantly degraded with respect to 
Maximum Likelihood (ML) decoding performance. Moreover,
the two possible culprits are not mutually exclusive, and so both
may occur.

In this paper we show an improvement to the SC decoder, 
namely, a successive cancellation list (SCL) decoder. Our list 
decoder has a corresponding list size L, and setting L = 1 
results in the classic SC decoder. It should be noted that the 
word "list" was chosen as part of the name of our decoder in
order to highlight a key concept relating to its inner workings.
However, when our algorithm finishes, it returns a single
codeword.

The solid lines in Figure 1 correspond to choosing the most
likely codeword from the list as the decoder output. As can be
seen, this choice of the most likely codeword results in a large
range in which our algorithm has performance very close to
that of the ML decoder, even for moderate values of L. Thus,
the sub-optimality of the SC decoder indeed does play a role
in the disappointing performance of polar codes.

Fig. 1. Word error rate of a length n = 2048 rate 1/2 polar code optimized
for SNR = 2 dB under various list sizes. Code construction was carried out via
the method proposed in [4]. The two dots represent upper and lower bounds
[5] on the SNR needed to reach a word error rate of $10^{-5}$.

Fig. 2. Comparison of our polar coding and decoding schemes to an
implementation of the WiMax standard taken from [6]. All codes are rate 1/2.
The length of the polar code is 2048 while the length of the WiMax code is
2304. The list size used was L = 32. The CRC used was 16 bits long.

Even with the above improvement, the performance of
polar codes falls short. Thus, we conclude that polar codes
themselves are weak. Luckily, we can do better. Suppose that
instead of picking the most likely codeword from the list, a
"genie" would aid us by telling us what codeword in the list
was the transmitted codeword (if the transmitted codeword was
indeed present in the list). Luckily, implementing such a genie
turns out to be simple, and entails a slight modification of the
polar code. With this modification, the performance of polar
codes is comparable to state of the art LDPC codes, as can be
seen in Figure 2.

In fairness, we refer to Figure 3 and note that there are
LDPC codes of length 2048 and rate 1/2 with better per-
formance than our polar codes. However, to the best of our
knowledge, for length 1024 and rate 1/2 it seems that our
implementation is slightly better than previously known codes
when considering a target error probability of $10^{-4}$.

[Figure 3 plot omitted: Normalized rates of code families over BIAWGN, Pe = 0.0001]

Fig. 3. Comparison of normalized rate for a wide class of codes. The
target word error rate is $10^{-4}$. The plot is courtesy of Dr. Yury Polyanskiy.

The structure of this paper is as follows. In Section II we
present Arikan's SC decoder in a notation that will be useful to
us later on. In Section III we show how the space complexity
of the SC decoder can be brought down from $O(n \log n)$ to
$O(n)$. This observation will later help us in Section IV, where
we present our successive cancellation list decoder with time
complexity $O(L \cdot n \log n)$. Section V introduces a modification
of polar codes which, when decoded with the SCL decoder, 
results in a significant improvement in terms of error rate. 

This paper contains a fair amount of algorithmic detail. 
Thus, on a first read, we advise the reader to skip to Section IV 
and read the first three paragraphs. Doing so will give a high- 
level understanding of the decoding method proposed and also 
show why a naive implementation is too costly. Then, we 
advise the reader to skim Section V, where the "list picking
genie" is explained. 

II. Formalization of the Successive Cancellation 
Decoder 

The Successive Cancellation (SC) decoder is due to Arikan 
[1]. In this section, we recast it using our notation, for future
reference. 

Let the polar code under consideration have length $n = 2^m$
and dimension $k$. Thus, the number of frozen bits is $n - k$.
We denote by $u = (u_i)_{i=0}^{n-1} = u_0^{n-1}$ the information bits vector
(including the frozen bits), and by $c = c_0^{n-1}$ the corresponding
codeword, which is sent over a binary-input channel $W : \mathcal{X} \to \mathcal{Y}$,
where $\mathcal{X} = \{0, 1\}$. At the other end of the channel, we
get the received word $y = y_0^{n-1}$. A decoding algorithm is
then applied to $y$, resulting in a decoded codeword $\hat{c}$ having
corresponding information bits $\hat{u}$.

A. An outline of Successive Cancellation 

A high-level description of the SC decoding algorithm
is given in Algorithm 1. In words, at each phase $\varphi$ of
the algorithm, we must first calculate the pair of probabilities
$W_m^{(\varphi)}(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid 0)$ and $W_m^{(\varphi)}(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid 1)$, defined
shortly. Then, we must make a decision as to the value of $\hat{u}_\varphi$
according to the pair of probabilities.



Algorithm 1: A high-level description of the SC decoder
Input: the received vector y
Output: a decoded codeword ĉ
1  for φ = 0, 1, ..., n−1 do
2      calculate W_m^(φ)(y_0^{n−1}, û_0^{φ−1} | 0) and W_m^(φ)(y_0^{n−1}, û_0^{φ−1} | 1)
3      if u_φ is frozen then
4          set û_φ to the frozen value of u_φ
5      else
6          if W_m^(φ)(y_0^{n−1}, û_0^{φ−1} | 0) > W_m^(φ)(y_0^{n−1}, û_0^{φ−1} | 1) then
7              set û_φ ← 0
8          else
9              set û_φ ← 1
10 return the codeword ĉ corresponding to û

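We now show how the above probabilities are calculated.
For layer $0 \le \lambda \le m$, denote hereafter

$$\Lambda = 2^{\lambda}. \tag{1}$$

Recall [1] that for

$$0 \le \varphi < \Lambda, \tag{2}$$

bit channel $W_\lambda^{(\varphi)}$ is a binary input channel with output
alphabet $\mathcal{Y}^{\Lambda} \times \mathcal{X}^{\varphi}$, the conditional probability of which we
generically denote as

$$W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, u_0^{\varphi-1} \mid u_\varphi). \tag{3}$$

In our context, $y_0^{\Lambda-1}$ is always a contiguous subvector of the
received vector $y$. Next, for $1 \le \lambda \le m$, recall the recursive
definition of a bit channel [1, Equations (22) and (23)]: let
$0 \le 2\psi < \Lambda$; then

$$\underbrace{W_\lambda^{(2\psi)}(y_0^{\Lambda-1}, u_0^{2\psi-1} \mid u_{2\psi})}_{\text{branch } \beta} = \sum_{u_{2\psi+1}} \frac{1}{2} \, \underbrace{W_{\lambda-1}^{(\psi)}(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi} \oplus u_{2\psi+1})}_{\text{branch } 2\beta} \cdot \underbrace{W_{\lambda-1}^{(\psi)}(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi+1})}_{\text{branch } 2\beta+1} \tag{4}$$

and

$$\underbrace{W_\lambda^{(2\psi+1)}(y_0^{\Lambda-1}, u_0^{2\psi} \mid u_{2\psi+1})}_{\text{branch } \beta} = \frac{1}{2} \, \underbrace{W_{\lambda-1}^{(\psi)}(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi} \oplus u_{2\psi+1})}_{\text{branch } 2\beta} \cdot \underbrace{W_{\lambda-1}^{(\psi)}(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1} \mid u_{2\psi+1})}_{\text{branch } 2\beta+1} \tag{5}$$

where $u_{0,\mathrm{even}}^{2\psi-1}$ and $u_{0,\mathrm{odd}}^{2\psi-1}$ denote the subvectors of $u_0^{2\psi-1}$
with even and odd indices, respectively, and with "stopping condition"
$W_0^{(0)}(y \mid u) = W(y \mid u)$.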
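In the probability domain, one step of recursions (4) and (5) acts on two-element pairs indexed by the bit value. The following is a minimal Python sketch, assuming that `left[b]` holds $W_{\lambda-1}^{(\psi)}(\cdot \mid b)$ for branch $2\beta$ and `right[b]` holds the same quantity for branch $2\beta+1$. The function names `combine_even` and `combine_odd` are hypothetical and do not come from the paper.

```python
def combine_even(left, right):
    """Equation (4): pair for W_lam^(2psi) built from two W_{lam-1}^(psi) pairs.

    left[b]:  W_{lam-1}^(psi)(first-half output  | b), branch 2*beta
    right[b]: W_{lam-1}^(psi)(second-half output | b), branch 2*beta + 1
    Returns [value for u_{2psi} = 0, value for u_{2psi} = 1].
    """
    # Sum over the not-yet-decided bit u_{2psi+1} (called u2 here).
    return [sum(0.5 * left[u1 ^ u2] * right[u2] for u2 in (0, 1)) for u1 in (0, 1)]


def combine_odd(left, right, u_even):
    """Equation (5): pair for W_lam^(2psi+1), given the already decided bit u_{2psi}."""
    return [0.5 * left[u_even ^ u2] * right[u2] for u2 in (0, 1)]
```

For example, on a binary symmetric channel with crossover probability 0.1 and both received bits equal to 0, each input pair is `[0.9, 0.1]`, and `combine_even` returns approximately `[0.41, 0.09]`. Summing `combine_odd(left, right, u)` over its two entries gives back `combine_even(left, right)[u]`, as the sum in (4) requires.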
B. Detailed description

For Algorithm 1 to become concrete, we must specify how
the probability pair associated with $W_m^{(\varphi)}$ is calculated, and
how the set values of $\hat{u}$, namely $\hat{u}_0^{\varphi-1}$, are propagated into
those calculations. We now show an implementation that is
straightforward, yet somewhat wasteful in terms of space.

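For $\lambda > 0$ and $0 \le \varphi < \Lambda$, recall the recursive definition of
$W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, u_0^{\varphi-1} \mid u_\varphi)$ given in either (4) or (5), depending on
the parity of $\varphi$. For either $\varphi = 2\psi$ or $\varphi = 2\psi + 1$, the channel
$W_{\lambda-1}^{(\psi)}$ is evaluated with output $(y_0^{\Lambda/2-1}, u_{0,\mathrm{even}}^{2\psi-1} \oplus u_{0,\mathrm{odd}}^{2\psi-1})$,
as well as with output $(y_{\Lambda/2}^{\Lambda-1}, u_{0,\mathrm{odd}}^{2\psi-1})$. Since our algorithm
will make use of these recursions, we need a simple way
of defining which output we are referring to. We do this by
specifying, apart from the layer $\lambda$ and the phase $\varphi$ which define
the channel, the branch number

$$0 \le \beta < 2^{m-\lambda}. \tag{6}$$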



Since, during the run of the SC algorithm, the channel $W_m^{(\varphi)}$
is only evaluated with a single output, $(y_0^{n-1}, \hat{u}_0^{\varphi-1})$, we give a
branch number of $\beta = 0$ to each such output. Next, we proceed
recursively as follows. For $\lambda > 0$, consider a channel $W_\lambda^{(\varphi)}$
with output $(y_0^{\Lambda-1}, \hat{u}_0^{\varphi-1})$ and corresponding branch number
$\beta$. Denote $\psi = \lfloor \varphi/2 \rfloor$. The output $(y_0^{\Lambda/2-1}, \hat{u}_{0,\mathrm{even}}^{2\psi-1} \oplus \hat{u}_{0,\mathrm{odd}}^{2\psi-1})$
associated with $W_{\lambda-1}^{(\psi)}$ will have a branch number of $2\beta$, while
the output $(y_{\Lambda/2}^{\Lambda-1}, \hat{u}_{0,\mathrm{odd}}^{2\psi-1})$ will have a branch number of $2\beta + 1$.
Finally, we mention that for the sake of brevity, we will
talk about the output corresponding to branch $\beta$ of a channel,
although this is slightly inaccurate.

We now introduce our first data structure. For each layer
$0 \le \lambda \le m$, we will have a probabilities array, denoted by
$P_\lambda$, indexed by an integer $0 \le i < 2^m$ and a bit $b \in \{0, 1\}$.
For a given layer $\lambda$, an index $i$ will correspond to a phase
$0 \le \varphi < \Lambda$ and branch $0 \le \beta < 2^{m-\lambda}$ using the following
quotient/remainder representation.

$$i = \langle \varphi, \beta \rangle_\lambda = \varphi + 2^{\lambda} \cdot \beta. \tag{7}$$

In order to avoid repetition, we use the following shorthand

$$P_\lambda[\langle \varphi, \beta \rangle] = P_\lambda[\langle \varphi, \beta \rangle_\lambda]. \tag{8}$$
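The quotient/remainder indexing of (7) is easy to check mechanically. The Python sketch below shows it; the helper names `pack`, `unpack`, and `branch_outputs` are hypothetical. `branch_outputs` encodes our reading of the branch recursion: the first half of a channel's output goes to branch $2\beta$ and the second half to $2\beta+1$, so branch $\beta$ at layer $\lambda$ sees a contiguous block of $\Lambda$ received symbols. At $\lambda = 0$ this reading agrees with Lemma 1 below.

```python
def pack(phi, beta, lam):
    """Index i = <phi, beta>_lam = phi + 2**lam * beta of equation (7)."""
    assert 0 <= phi < 2 ** lam, "phase must satisfy 0 <= phi < Lambda"
    return phi + (2 ** lam) * beta


def unpack(i, lam):
    """Inverse of (7): the quotient is the branch beta, the remainder is the phase phi."""
    beta, phi = divmod(i, 2 ** lam)
    return phi, beta


def branch_outputs(beta, lam):
    """Positions of y seen by branch beta at layer lam (our reading: a contiguous
    block of length Lambda = 2**lam; first half -> 2*beta, second half -> 2*beta + 1)."""
    return range(beta * 2 ** lam, (beta + 1) * 2 ** lam)
```

For every layer, the map $(\varphi, \beta) \mapsto i$ is a bijection onto $\{0, \dots, 2^m - 1\}$. This is why a single array of length $2^m$ per layer suffices in this first implementation.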

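The probabilities array data structure $P_\lambda$ will be used as
follows. Let a layer $0 \le \lambda \le m$, phase $0 \le \varphi < \Lambda$, and branch
$0 \le \beta < 2^{m-\lambda}$ be given. Denote the output corresponding to
branch $\beta$ of $W_\lambda^{(\varphi)}$ as $(y_0^{\Lambda-1}, \hat{u}_0^{\varphi-1})$. Then, ultimately, we will
have for both values of $b$ that

$$P_\lambda[\langle \varphi, \beta \rangle][b] = W_\lambda^{(\varphi)}(y_0^{\Lambda-1}, \hat{u}_0^{\varphi-1} \mid b). \tag{9}$$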



Analogously to defining the output corresponding to a
branch $\beta$, we would now like to define the input corresponding
to a branch. As in the "output" case, we start at layer $m$
and continue recursively. Consider the channel $W_m^{(\varphi)}$, and let
$\hat{u}_\varphi$ be the corresponding input which Algorithm 1 assumes.
We let this input have a branch number of $\beta = 0$. Next, we
proceed recursively as follows. For layer $\lambda > 0$, consider the
channels $W_\lambda^{(2\psi)}$ and $W_\lambda^{(2\psi+1)}$ having the same branch $\beta$ with
corresponding inputs $u_{2\psi}$ and $u_{2\psi+1}$, respectively. In light of
(5), we now consider $W_{\lambda-1}^{(\psi)}$ and define the input corresponding
to branch $2\beta$ as $u_{2\psi} \oplus u_{2\psi+1}$. Likewise, we define the input
corresponding to branch $2\beta + 1$ as $u_{2\psi+1}$. Note that under
this recursive definition, we have that for all $0 \le \lambda \le m$,
$0 \le \varphi < \Lambda$, and $0 \le \beta < 2^{m-\lambda}$, the input corresponding to
branch $\beta$ of $W_\lambda^{(\varphi)}$ is well defined.

The following lemma points at the natural meaning that
a branch number has at layer $\lambda = 0$. It is proved using a
straightforward induction.

Lemma 1: Let $y$ and $\hat{c}$ be as in Algorithm 1, the received
vector and the decoded codeword. Consider layer $\lambda = 0$, and
thus set $\varphi = 0$. Next, fix a branch number $0 \le \beta < 2^m$. Then,
the input and output corresponding to branch $\beta$ of $W_0^{(0)}$ are
$\hat{c}_\beta$ and $y_\beta$, respectively.

We now introduce our second, and last, data structure for
this section. For each layer $0 \le \lambda \le m$, we will have a bit
array, denoted by $B_\lambda$, and indexed by an integer $0 \le i < 2^m$,
as in (7). The data structure will be used as follows. Let layer
$0 \le \lambda \le m$, phase $0 \le \varphi < \Lambda$, and branch $0 \le \beta < 2^{m-\lambda}$ be
given. Denote the input corresponding to branch $\beta$ of $W_\lambda^{(\varphi)}$
as $\hat{u}(\lambda, \varphi, \beta)$. Then, ultimately,

$$B_\lambda[\langle \varphi, \beta \rangle] = \hat{u}(\lambda, \varphi, \beta), \tag{10}$$

where we have used the same shorthand as in (8). Notice that
the total memory consumed by our algorithm is $O(n \log n)$.

Our first implementation of the SC decoder is given as
Algorithms 2–4. The main loop is given in Algorithm 2,
and follows the high-level description given in Algorithm 1.
Note that the elements of the probabilities arrays $P_\lambda$ and bit
arrays $B_\lambda$ start out uninitialized, and become initialized as the
algorithm runs its course. The code to initialize the array
values is given in Algorithms 3 and 4.

Algorithm 2: First implementation of SC decoder
Input: the received vector y
Output: a decoded codeword ĉ
1  for β = 0, 1, ..., n−1 do   // Initialization
2      P_0[⟨0, β⟩][0] ← W(y_β | 0),  P_0[⟨0, β⟩][1] ← W(y_β | 1)
3  for φ = 0, 1, ..., n−1 do   // Main loop
4      recursivelyCalcP(m, φ)
5      if u_φ is frozen then
6          set B_m[⟨φ, 0⟩] to the frozen value of u_φ
7      else
8          if P_m[⟨φ, 0⟩][0] > P_m[⟨φ, 0⟩][1] then
9              set B_m[⟨φ, 0⟩] ← 0
10         else
11             set B_m[⟨φ, 0⟩] ← 1
12     if φ mod 2 = 1 then
13         recursivelyUpdateB(m, φ)
14 return the decoded codeword: ĉ = (B_0[⟨0, β⟩])_{β=0}^{n−1}



Lemma 2: Algorithms 2–4 are a valid implementation of
the SC decoder.

Proof: We first note that in addition to proving the claim 
explicitly stated in the lemma, we must also prove an implicit 
claim. Namely, we must prove that the actions taken by the 
algorithm are well defined. Specifically, we must prove that 
when an array element is read from, it was already written to 
(it is initialized). 
Algorithm 3: recursivelyCalcP(λ, φ) implementation I
Input: layer λ and phase φ
1  if λ = 0 then return   // Stopping condition
2  set ψ ← ⌊φ/2⌋
   // Recurse first, if needed
3  if φ mod 2 = 0 then recursivelyCalcP(λ−1, ψ)
4  for β = 0, 1, ..., 2^{m−λ}−1 do   // calculation
5      if φ mod 2 = 0 then   // apply Equation (4)
6          for u' ∈ {0, 1} do
7              P_λ[⟨φ, β⟩][u'] ← Σ_{u''} ½ · P_{λ−1}[⟨ψ, 2β⟩][u' ⊕ u''] · P_{λ−1}[⟨ψ, 2β+1⟩][u'']
8      else   // apply Equation (5)
9          set u' ← B_λ[⟨φ−1, β⟩]
10         for u'' ∈ {0, 1} do
11             P_λ[⟨φ, β⟩][u''] ← ½ · P_{λ−1}[⟨ψ, 2β⟩][u' ⊕ u''] · P_{λ−1}[⟨ψ, 2β+1⟩][u'']

Algorithm 4: recursivelyUpdateB(λ, φ) implementation I
Input: layer λ and phase φ
Require: φ is odd
1  set ψ ← ⌊φ/2⌋
2  for β = 0, 1, ..., 2^{m−λ}−1 do
3      B_{λ−1}[⟨ψ, 2β⟩] ← B_λ[⟨φ−1, β⟩] ⊕ B_λ[⟨φ, β⟩]
4      B_{λ−1}[⟨ψ, 2β+1⟩] ← B_λ[⟨φ, β⟩]
5  if ψ mod 2 = 1 then
6      recursivelyUpdateB(λ−1, ψ)



Both the implicit and explicit claims are easily derived from
the following observation. For a given $0 \le \varphi < n$, consider
iteration $\varphi$ of the main loop in Algorithm 2. Fix a layer
$0 \le \lambda \le m$, and a branch $0 \le \beta < 2^{m-\lambda}$. If we suspend the run
of the algorithm just after the iteration ends, then (9) holds
with $\varphi'$ instead of $\varphi$, for all

$$0 \le \varphi' \le \left\lfloor \frac{\varphi}{2^{m-\lambda}} \right\rfloor.$$

Similarly, (10) holds with $\varphi'$ instead of $\varphi$, for all

$$0 \le \varphi' < \left\lfloor \frac{\varphi + 1}{2^{m-\lambda}} \right\rfloor.$$

The above observation is proved by induction on $\varphi$. ∎

III. Space-Efficient Successive Cancellation 
Decoding 

The running time of the SC decoder is $O(n \log n)$, and our
implementation is no exception. As we have previously noted,
the space complexity of our algorithm is $O(n \log n)$ as well.
However, we will now show how to bring the space complexity
down to $O(n)$. The observation that one can reduce the space
complexity to $O(n)$ was noted, in the context of VLSI design,
in [7].

As a first step towards this end, consider the probability
pair array $P_m$. By examining the main loop in Algorithm 2,
we quickly see that if we are currently at phase $\varphi$, then we
will never again make use of $P_m[\langle \varphi', 0 \rangle]$ for all $\varphi' < \varphi$. On
the other hand, we see that $P_m[\langle \varphi'', 0 \rangle]$ is uninitialized for all
$\varphi'' > \varphi$. Thus, instead of reading and writing to $P_m[\langle \varphi, 0 \rangle]$,
we can essentially disregard the phase information, and use
only the first element $P_m[0]$ of the array, discarding all the
rest. By the recursive nature of polar codes, this observation
(disregarding the phase information) can be exploited for
a general layer $\lambda$ as well. Specifically, for all $0 \le \lambda \le m$,
let us now define the number of elements in $P_\lambda$ to be $2^{m-\lambda}$.
Accordingly,



$$P_\lambda[\langle \varphi, \beta \rangle] \text{ is replaced by } P_\lambda[\beta]. \tag{11}$$



Note that the total space needed to hold the P arrays has
gone down from $O(n \log n)$ to $O(n)$. We would now like to do
the same for the B arrays. However, as things are currently
stated, we cannot disregard the phase, as can be seen, for
example, in line 3 of Algorithm 4. The solution is a simple
renaming. As a first step, let us define for each $0 \le \lambda \le m$ an
array $C_\lambda$ consisting of bit pairs and having length $n/2$. Next,
let a generic reference of the form $B_\lambda[\langle \varphi, \beta \rangle]$ be replaced by
$C_\lambda[\psi + \beta \cdot 2^{\lambda-1}][\varphi \bmod 2]$, where $\psi = \lfloor \varphi/2 \rfloor$. Note that we
have done nothing more than rename the elements of $B_\lambda$ as
elements of $C_\lambda$. However, we now see that as before we can
disregard the value of $\psi$ and take note only of the parity of $\varphi$.
So, let us make one more substitution: replace every instance
of $C_\lambda[\psi + \beta \cdot 2^{\lambda-1}][\varphi \bmod 2]$ by $C_\lambda[\beta][\varphi \bmod 2]$, and resize
each array $C_\lambda$ to have $2^{m-\lambda}$ bit pairs. To sum up,



$$B_\lambda[\langle \varphi, \beta \rangle] \text{ is replaced by } C_\lambda[\beta][\varphi \bmod 2]. \tag{12}$$



The alert reader will notice that a further reduction in space
is possible: for $\lambda = 0$ we will always have that $\varphi = 0$, and
thus the parity of $\varphi$ is always even. However, this reduction
does not affect the asymptotic space complexity, which is
now indeed down to $O(n)$. The revised algorithm is given
as Algorithms 5–7.
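The $O(n)$ count can be made explicit. After substitutions (11) and (12), layer $\lambda$ holds $2^{m-\lambda}$ probability pairs and $2^{m-\lambda}$ bit pairs, so summing the geometric series over all layers gives, for each kind of array,

$$\sum_{\lambda=0}^{m} 2^{m-\lambda} = 2^{m+1} - 1 = 2n - 1$$

pairs. The arrays $P_\lambda$ and $B_\lambda$ of Section II instead used $(m+1) \cdot 2^m = n(\log_2 n + 1)$ entries of each kind.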



Algorithm 5: Space efficient SC decoder, main loop
Input: the received vector y
Output: a decoded codeword ĉ
1  for β = 0, 1, ..., n−1 do   // Initialization
2      set P_0[β][0] ← W(y_β | 0),  P_0[β][1] ← W(y_β | 1)
3  for φ = 0, 1, ..., n−1 do   // Main loop
4      recursivelyCalcP(m, φ)
5      if u_φ is frozen then
6          set C_m[0][φ mod 2] to the frozen value of u_φ
7      else
8          if P_m[0][0] > P_m[0][1] then
9              set C_m[0][φ mod 2] ← 0
10         else
11             set C_m[0][φ mod 2] ← 1
12     if φ mod 2 = 1 then
13         recursivelyUpdateC(m, φ)
14 return the decoded codeword: ĉ = (C_0[β][0])_{β=0}^{n−1}



We end this section by mentioning that although we were
concerned here with reducing the space complexity of our SC
decoder, the observations made with this goal in mind will
be of great use in analyzing the time complexity of our list
decoder.
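A compact Python sketch of Algorithms 5–7 may help in following the index bookkeeping. It assumes the encoder split implied by equations (4) and (5): the first half of the codeword carries $u_{\mathrm{even}} \oplus u_{\mathrm{odd}}$ and the second half carries $u_{\mathrm{odd}}$. It also represents probabilities as plain floats and passes the frozen positions as a dictionary. The names `polar_encode` and `sc_decode` are our own, and returning $\hat{u}$ alongside $\hat{c}$ is an addition for convenience.

```python
def polar_encode(u):
    """Encode u (length a power of two) as encode(u_even ^ u_odd) + encode(u_odd),
    the input/output split used in equations (4) and (5)."""
    if len(u) == 1:
        return list(u)
    even, odd = u[0::2], u[1::2]
    return polar_encode([a ^ b for a, b in zip(even, odd)]) + polar_encode(odd)


def sc_decode(W, y, frozen):
    """Space-efficient SC decoder following Algorithms 5-7.

    W(y_i, b): channel likelihood W(y_i | b).
    frozen:    dict mapping each frozen phase to its frozen bit value.
    Returns (c_hat, u_hat).
    """
    n = len(y)
    m = n.bit_length() - 1                      # n = 2**m
    # P[lam][beta] = [prob for bit 0, prob for bit 1]; layer lam has 2**(m-lam) pairs
    P = [[[0.0, 0.0] for _ in range(2 ** (m - lam))] for lam in range(m + 1)]
    # C[lam][beta] = [bit of even phase, bit of odd phase], as in (12)
    C = [[[0, 0] for _ in range(2 ** (m - lam))] for lam in range(m + 1)]

    def calc_p(lam, phi):                       # Algorithm 6
        if lam == 0:
            return
        psi = phi // 2
        if phi % 2 == 0:
            calc_p(lam - 1, psi)
        for beta in range(2 ** (m - lam)):
            left, right = P[lam - 1][2 * beta], P[lam - 1][2 * beta + 1]
            if phi % 2 == 0:                    # Equation (4)
                for u1 in (0, 1):
                    P[lam][beta][u1] = sum(0.5 * left[u1 ^ u2] * right[u2] for u2 in (0, 1))
            else:                               # Equation (5)
                u1 = C[lam][beta][0]
                for u2 in (0, 1):
                    P[lam][beta][u2] = 0.5 * left[u1 ^ u2] * right[u2]

    def update_c(lam, phi):                     # Algorithm 7 (phi is odd)
        psi = phi // 2
        for beta in range(2 ** (m - lam)):
            C[lam - 1][2 * beta][psi % 2] = C[lam][beta][0] ^ C[lam][beta][1]
            C[lam - 1][2 * beta + 1][psi % 2] = C[lam][beta][1]
        if psi % 2 == 1:
            update_c(lam - 1, psi)

    for beta in range(n):                       # Algorithm 5, initialization
        P[0][beta] = [W(y[beta], 0), W(y[beta], 1)]
    u_hat = []
    for phi in range(n):                        # Algorithm 5, main loop
        calc_p(m, phi)
        if phi in frozen:
            bit = frozen[phi]
        else:
            bit = 0 if P[m][0][0] > P[m][0][1] else 1
        C[m][0][phi % 2] = bit
        u_hat.append(bit)
        if phi % 2 == 1:
            update_c(m, phi)
    return [C[0][beta][0] for beta in range(n)], u_hat
```

For long codes, products of raw probabilities underflow quickly. The abstract mentions a numerically stable implementation for that reason; this sketch makes no such attempt and is meant only for small $n$.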

Algorithm 6: recursivelyCalcP(λ, φ) space-efficient
Input: layer λ and phase φ
1  if λ = 0 then return   // Stopping condition
2  set ψ ← ⌊φ/2⌋
   // Recurse first, if needed
3  if φ mod 2 = 0 then recursivelyCalcP(λ−1, ψ)
   // Perform the calculation
4  for β = 0, 1, ..., 2^{m−λ}−1 do
5      if φ mod 2 = 0 then   // apply Equation (4)
6          for u' ∈ {0, 1} do
7              P_λ[β][u'] ← Σ_{u''} ½ · P_{λ−1}[2β][u' ⊕ u''] · P_{λ−1}[2β+1][u'']
8      else   // apply Equation (5)
9          set u' ← C_λ[β][0]
10         for u'' ∈ {0, 1} do
11             P_λ[β][u''] ← ½ · P_{λ−1}[2β][u' ⊕ u''] · P_{λ−1}[2β+1][u'']

Algorithm 7: recursivelyUpdateC(λ, φ) space-efficient
Input: layer λ and phase φ
Require: φ is odd
1  set ψ ← ⌊φ/2⌋
2  for β = 0, 1, ..., 2^{m−λ}−1 do
3      C_{λ−1}[2β][ψ mod 2] ← C_λ[β][0] ⊕ C_λ[β][1]
4      C_{λ−1}[2β+1][ψ mod 2] ← C_λ[β][1]
5  if ψ mod 2 = 1 then
6      recursivelyUpdateC(λ−1, ψ)

IV. Successive Cancellation List Decoder

In this section we introduce and define our algorithm, the
successive cancellation list (SCL) decoder. Our list decoder
has a parameter L, called the list size. Generally speaking,
larger values of L mean lower error rates but longer running
times. We note at this point that successive cancellation list
decoding is not a new idea: it was applied in [8] to Reed-Muller
codes.¹

Recall the main loop of an SC decoder, where at each phase
we must decide on the value of $\hat{u}_\varphi$. In an SCL decoder, instead
of deciding to set the value of an unfrozen $\hat{u}_\varphi$ to either a 0
or a 1, we inspect both options. Namely, when decoding a
non-frozen bit, we split the decoding path into two paths (see
Figure 4). Since each split doubles the number of paths to be
examined, we must prune them, and the maximum number of
paths allowed is the specified list size, L. Naturally, we would
like to keep the "best" paths at each stage, and thus require
a pruning criterion. Our pruning criterion will be to keep the
most likely paths.
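The split-and-prune step at an unfrozen phase can be sketched as follows. The sketch assumes each active path already has its pair $W_m^{(\varphi)}(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid b)$ for $b \in \{0, 1\}$. Under uniform inputs, these values are proportional to the joint probability of $y$ and the path's decisions $\hat{u}_0^{\varphi}$, so they can be compared across paths. The function name `split_and_prune` and its input format are hypothetical. The sketch also copies bit prefixes naively and ignores the data-structure issues addressed below.

```python
def split_and_prune(paths, pairs, L):
    """One SCL step at an unfrozen phase (illustrative sketch).

    paths: decided bits of each active path, as tuples
    pairs: pairs[i] = (p0, p1), the probabilities W_m^(phi)(y, u_hat_0^(phi-1) | b)
           computed for path i
    Returns at most L forks, most likely first, as (bits, probability).
    """
    # Every active path forks into a 0-continuation and a 1-continuation.
    forks = [(bits + (b,), pair[b]) for bits, pair in zip(paths, pairs) for b in (0, 1)]
    # Pruning criterion: keep the L most likely forks.
    forks.sort(key=lambda fork: fork[1], reverse=True)
    return forks[:L]
```

With two paths `(0,)` and `(1,)`, pairs `(0.30, 0.05)` and `(0.20, 0.40)`, and `L = 2`, the survivors are `(1, 1)` and `(0, 0)`. Copying whole prefixes in this way is exactly the cost that the following paragraphs set out to avoid.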

¹In a somewhat different version of successive cancellation than that of
Arikan's, at least in exposition.




Fig. 4. Decoding paths of unfrozen bits for L = 4: each level has at most 
4 nodes with paths that continue downward. Discontinued paths are colored 
gray. 



Consider the following outline for a naive implementation
of an SCL decoder. Each time a decoding path is split into
two forks, the data structures used by the "parent" path are
duplicated, with one copy given to the first fork and the other
to the second. Since the number of splits is $\Omega(L \cdot n)$, and
since the size of the data structures used by each path is
$\Omega(n)$, the copying operation alone would take time $\Omega(L \cdot n^2)$.
This running time is clearly impractical for all but the
shortest of codes. However, all known (to us) implementations
of successive cancellation list decoding have complexity at
least $\Omega(L \cdot n^2)$. Our main contribution in this section is the
following: we show how to implement SCL decoding with
time complexity $O(L \cdot n \log n)$ instead of $\Omega(L \cdot n^2)$.

The key observation is as follows. Consider the P arrays of
the last section, and recall that the size of $P_\lambda$ is proportional
to $2^{m-\lambda}$. Thus, the cost of copying $P_\lambda$ grows exponentially
small with $\lambda$. On the other hand, looking at the main loop of
Algorithm 5 and unwinding the recursion, we see that $P_\lambda$ is
accessed only every $2^{m-\lambda}$ incrementations of $\varphi$. Put another
way, the bigger $P_\lambda$ is, the less frequently it is accessed. The
same observation applies to the C arrays. This observation
suggests the use of a "lazy-copy". Namely, at each given stage,
the same array may be flagged as belonging to more than one
decoding path. However, when a given decoding path needs
access to an array it is sharing with another path, a copy is
made.
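A back-of-the-envelope count shows why this helps. The count rests on a simplifying assumption of ours: a single path triggers at most one copy of $P_\lambda$ per access. Over the $n$ phases, $P_\lambda$ is accessed $n/2^{m-\lambda} = 2^{\lambda}$ times, and each copy costs $O(2^{m-\lambda})$, so the copying cost per path is at most proportional to

$$\sum_{\lambda=0}^{m} \underbrace{2^{\lambda}}_{\text{accesses}} \cdot \underbrace{2^{m-\lambda}}_{\text{copy size}} = (m+1)\, 2^m = n(\log_2 n + 1).$$

The same count applies to the C arrays. Summed over at most $L$ paths, this is $O(L \cdot n \log n)$, compared with the $\Omega(L \cdot n^2)$ of the naive scheme.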



A. Low-level functions 

We now discuss the low-level functions and data structures
by which the "lazy-copy" methodology is realized. We note
in advance that since our aim was to keep the exposition as
simple as possible, we have avoided some obvious
optimizations. The following data structures are defined and
initialized in Algorithm 8.

Algorithm 8: initializeDataStructures()
1  inactivePathIndices ← new stack with capacity L
2  activePath ← new boolean array of size L
3  arrayPointer_P ← new 2-D array of size (m+1) × L, the elements of which are array pointers
4  arrayPointer_C ← new 2-D array of size (m+1) × L, the elements of which are array pointers
5  pathIndexToArrayIndex ← new 2-D array of size (m+1) × L
6  inactiveArrayIndices ← new array of size m+1, the elements of which are stacks with capacity L
7  arrayReferenceCount ← new 2-D array of size (m+1) × L
   // Initialization of data structures
8  for λ = 0, 1, ..., m do
9      for s = 0, 1, ..., L−1 do
10         arrayPointer_P[λ][s] ← new array of float pairs of size 2^{m−λ}
11         arrayPointer_C[λ][s] ← new array of bit pairs of size 2^{m−λ}
12         arrayReferenceCount[λ][s] ← 0
13         push(inactiveArrayIndices[λ], s)
14 for ℓ = 0, 1, ..., L−1 do
15     activePath[ℓ] ← false
16     push(inactivePathIndices, ℓ)






Each path will have an index $\ell$, where $0 \le \ell < L$. At
first, only one path will be active. As the algorithm runs
its course, paths will change states between "active" and
"inactive". The inactivePathIndices stack [10, Section 10.1]
will hold the indices of the inactive paths. We assume the
"array" implementation of a stack, in which both "push" and
"pop" operations take $O(1)$ time and a stack of capacity $L$
takes $O(L)$ space. The activePath array is a boolean array
such that activePath[ℓ] is true iff path $\ell$ is active. Note that,
essentially, both inactivePathIndices and activePath store
the same information. The utility of this redundancy will be
made clear shortly.
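The "array" implementation of a stack assumed above can be sketched in a few lines of Python. A list is preallocated once, so space is $O(L)$, and push and pop only move a size counter, so both are $O(1)$. The class name `ArrayStack` is our own.

```python
class ArrayStack:
    """Fixed-capacity stack backed by a preallocated list (sketch of the
    "array" stack assumed in the text): push and pop are O(1)."""

    def __init__(self, capacity):
        self._items = [None] * capacity   # O(capacity) space, allocated once
        self._size = 0                    # number of elements currently stored

    def push(self, x):
        if self._size == len(self._items):
            raise OverflowError("stack is full")
        self._items[self._size] = x
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty stack")
        self._size -= 1
        return self._items[self._size]

    def __len__(self):
        return self._size
```

Lemma 3 below guarantees that, for valid calling sequences, neither of the two error branches is ever taken.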

For every layer $\lambda$, we will have a "bank" of $L$ probability-pair
arrays for use by the active paths. At any given moment,
some of these arrays might be used by several paths, while 
others might not be used by any path. Each such array is 
pointed to by an element of arrayPointer_P. Likewise, we 
will have a bank of bit-pair arrays, pointed to by elements of 
arrayPointer_C. 

The pathIndexToArrayIndex array is used as follows. For
a given layer $\lambda$ and path index $\ell$, the probability-pair array and
bit-pair array corresponding to layer $\lambda$ of path $\ell$ are pointed
to by

arrayPointer_P[λ][pathIndexToArrayIndex[λ][ℓ]]

and

arrayPointer_C[λ][pathIndexToArrayIndex[λ][ℓ]],

respectively.

Recall that at any given moment, some probability-pair
and bit-pair arrays from our bank might be used by multiple
paths, while others may not be used by any. The value
of arrayReferenceCount[λ][s] denotes the number of paths
currently using the array pointed to by arrayPointer_P[λ][s].
Note that this is also the number of paths making use of
arrayPointer_C[λ][s]. The index s is contained in the stack
inactiveArrayIndices[λ] iff arrayReferenceCount[λ][s] is
zero.

Now that we have discussed how the data structures are
initialized, we continue and discuss the low-level functions
by which paths are made active and inactive. We start by
mentioning Algorithm 9, by which the initial path of the
algorithm is assigned and allocated. In words, we choose
a path index $\ell$ that is not currently in use (none of them
are), and mark it as used. Then, for each layer $\lambda$, we mark
(through pathIndexToArrayIndex) an index s such that both
arrayPointer_P[λ][s] and arrayPointer_C[λ][s] are allocated
to the current path.

Algorithm 9: assignInitialPath()
Output: index ℓ of initial path
1  ℓ ← pop(inactivePathIndices)
2  activePath[ℓ] ← true
   // Associate arrays with path index
3  for λ = 0, 1, ..., m do
4      s ← pop(inactiveArrayIndices[λ])
5      pathIndexToArrayIndex[λ][ℓ] ← s
6      arrayReferenceCount[λ][s] ← 1
7  return ℓ

Algorithm 10 is used to clone a path, which is the final step before
splitting that path in two. The logic is very similar to that of
Algorithm 9, but now we make the two paths share bit-arrays
and probability arrays.

Algorithm 10: clonePath(ℓ)
Input: index ℓ of path to clone
Output: index ℓ' of copy
1  ℓ' ← pop(inactivePathIndices)
2  activePath[ℓ'] ← true
   // Make ℓ' reference same arrays as ℓ
3  for λ = 0, 1, ..., m do
4      s ← pathIndexToArrayIndex[λ][ℓ]
5      pathIndexToArrayIndex[λ][ℓ'] ← s
6      arrayReferenceCount[λ][s]++
7  return ℓ'

Algorithm 11 is used to terminate a path, which is achieved
by marking it as inactive. After this is done, the arrays marked
as associated with the path must be dealt with as follows. Since
the path is inactive, we think of it as not having any associated
arrays, and thus all the arrays that were previously associated
with the path must have their reference count decreased by
one.

Algorithm 11: killPath(ℓ)
Input: index ℓ of path to kill
   // Mark the path index ℓ as inactive
1  activePath[ℓ] ← false
2  push(inactivePathIndices, ℓ)
   // Disassociate arrays with path index
3  for λ = 0, 1, ..., m do
4      s ← pathIndexToArrayIndex[λ][ℓ]
5      arrayReferenceCount[λ][s]−−
6      if arrayReferenceCount[λ][s] = 0 then
7          push(inactiveArrayIndices[λ], s)



The goal of all previously discussed low-level functions was
essentially to enable the abstraction implemented by the
functions getArrayPointer_P and getArrayPointer_C.
The function getArrayPointer_P is called each time
a higher-level function needs to access (either for reading
or writing) the probability-pair array associated with
a certain path $\ell$ and layer $\lambda$. The implementation of
getArrayPointer_P is given in Algorithm 12. There are
two cases to consider: either the array is associated with more
than one path or it is not. If it is not, then nothing needs to
be done, and we return a pointer to the array. On the other
hand, if the array is shared, we make a private copy for path
$\ell$, and return a pointer to that copy. By doing so, we ensure
that two paths will never write to the same array. The function
getArrayPointer_C is used in the same manner for bit-pair
arrays, and has exactly the same implementation, up to
the obvious changes.

At this point, we remind the reader that we are deliberately 
sacrificing speed for simplicity. Namely, each such function 
is called either before reading or writing to an array, but the 
copy operation is really needed only before writing. 
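A compact Python rendering of Algorithms 8–12 may make the reference-counting bookkeeping easier to follow. In this sketch, Python lists stand in for the array-based stacks (`append` and `pop` act on the top) and nested lists stand in for the float-pair and bit-pair arrays. The method names are our own snake_case versions of the paper's function names. One deliberate choice of this sketch: when a shared index is made private, it copies both the probability-pair array and the bit-pair array, because both are addressed through the single pathIndexToArrayIndex entry. Algorithm 12 as printed copies only the array being requested.

```python
class PathBank:
    """Hypothetical Python rendering of Algorithms 8-12 (lazy-copy bookkeeping)."""

    def __init__(self, m, L):
        # Algorithm 8: initializeDataStructures()
        self.m, self.L = m, L
        self.inactive_paths = list(range(L))            # stack; top = last element
        self.active = [False] * L
        # P[lam][s] and C[lam][s]: bank of L pair-arrays per layer, 2**(m-lam) pairs each
        self.P = [[[[0.0, 0.0] for _ in range(2 ** (m - lam))] for _ in range(L)]
                  for lam in range(m + 1)]
        self.C = [[[[0, 0] for _ in range(2 ** (m - lam))] for _ in range(L)]
                  for lam in range(m + 1)]
        self.path_to_array = [[0] * L for _ in range(m + 1)]
        self.inactive_arrays = [list(range(L)) for _ in range(m + 1)]
        self.ref_count = [[0] * L for _ in range(m + 1)]

    def assign_initial_path(self):                      # Algorithm 9
        ell = self.inactive_paths.pop()
        self.active[ell] = True
        for lam in range(self.m + 1):
            s = self.inactive_arrays[lam].pop()
            self.path_to_array[lam][ell] = s
            self.ref_count[lam][s] = 1
        return ell

    def clone_path(self, ell):                          # Algorithm 10
        ell2 = self.inactive_paths.pop()
        self.active[ell2] = True
        for lam in range(self.m + 1):                   # share every array of path ell
            s = self.path_to_array[lam][ell]
            self.path_to_array[lam][ell2] = s
            self.ref_count[lam][s] += 1
        return ell2

    def kill_path(self, ell):                           # Algorithm 11
        self.active[ell] = False
        self.inactive_paths.append(ell)
        for lam in range(self.m + 1):
            s = self.path_to_array[lam][ell]
            self.ref_count[lam][s] -= 1
            if self.ref_count[lam][s] == 0:             # no path uses index s any more
                self.inactive_arrays[lam].append(s)

    def _private_index(self, lam, ell):                 # body of Algorithm 12
        s = self.path_to_array[lam][ell]
        if self.ref_count[lam][s] == 1:                 # not shared: use as is
            return s
        s2 = self.inactive_arrays[lam].pop()            # shared: take a free index
        for bank in (self.P, self.C):                   # sketch choice: copy both banks
            for dst, src in zip(bank[lam][s2], bank[lam][s]):
                dst[:] = src
        self.ref_count[lam][s] -= 1
        self.ref_count[lam][s2] = 1
        self.path_to_array[lam][ell] = s2
        return s2

    def get_array_pointer_P(self, lam, ell):
        return self.P[lam][self._private_index(lam, ell)]

    def get_array_pointer_C(self, lam, ell):
        return self.C[lam][self._private_index(lam, ell)]
```

Here is a typical sequence. After `clone_path`, both paths share every layer's arrays. The first `get_array_pointer_P(0, ...)` call by the clone gives it a private copy of layer 0 only, while the other layers remain shared.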

Algorithm 12: getArrayPointer_P(λ, ℓ)
Input: layer λ and path index ℓ
Output: pointer to corresponding probability pair array
   // getArrayPointer_C(λ, ℓ) is defined identically, up to the obvious changes in lines 6 and 10
1  s ← pathIndexToArrayIndex[λ][ℓ]
2  if arrayReferenceCount[λ][s] = 1 then
3      s' ← s
4  else
5      s' ← pop(inactiveArrayIndices[λ])
6      copy the contents of the array pointed to by arrayPointer_P[λ][s] into that pointed to by arrayPointer_P[λ][s']
7      arrayReferenceCount[λ][s]−−
8      arrayReferenceCount[λ][s'] ← 1
9      pathIndexToArrayIndex[λ][ℓ] ← s'
10 return arrayPointer_P[λ][s']

We have now finished defining almost all of our low-level
functions. At this point, we should specify the constraints one
should follow when using them and what one can expect if
these constraints are met. We start with the former.

Definition 1 (Valid calling sequence): Consider a sequence
$(f_t)_{t=0}^{T}$ of $T + 1$ calls to the low-level functions implemented
in Algorithms 8–12. We say that the sequence is valid if the
following traits hold.

Initialized: The one and only index $t$ for which $f_t$ is equal
to initializeDataStructures is $t = 0$. The one and
only index $t$ for which $f_t$ is equal to assignInitialPath
is $t = 1$.

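Balanced: For $1 \le t \le T$, denote the number of times the
function clonePath was called up to and including stage $t$
as

$$\#_{\mathrm{clonePath}}^{(t)} = \left| \{ 1 \le i \le t : f_i \text{ is clonePath} \} \right|.$$

Define $\#_{\mathrm{killPath}}^{(t)}$ similarly. Then, for every $1 \le t \le T$, we
require that

$$1 \le \left( 1 + \#_{\mathrm{clonePath}}^{(t)} - \#_{\mathrm{killPath}}^{(t)} \right) \le L. \tag{13}$$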



Active: We say that path $\ell$ is active at the end of stage
$1 \le t \le T$ if the following two conditions hold. First, there
exists an index $1 \le i \le t$ for which $f_i$ is either clonePath
with corresponding output $\ell$ or assignInitialPath with
output $\ell$. Second, there is no intermediate index $i < j \le t$ for
which $f_j$ is killPath with input $\ell$. For each $1 \le t < T$ we
require that if $f_{t+1}$ has input $\ell$, then $\ell$ is active at the end of
stage $t$.
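The three traits of Definition 1 can be checked mechanically. The sketch below assumes a hypothetical encoding of each call $f_t$ as a tuple `(name, inputs, output)`, where getArrayPointer_P and getArrayPointer_C take `(λ, ℓ)`. It illustrates the definition only and plays no part in the decoder.

```python
# Position of the path-index argument for calls that take one (hypothetical encoding).
PATH_ARG = {"clonePath": 0, "killPath": 0,
            "getArrayPointer_P": 1, "getArrayPointer_C": 1}


def is_valid(calls, L):
    """Check the Initialized, Balanced and Active traits of Definition 1."""
    names = [name for name, _, _ in calls]
    # Initialized: initializeDataStructures only at t = 0, assignInitialPath only at t = 1
    if [t for t, f in enumerate(names) if f == "initializeDataStructures"] != [0]:
        return False
    if [t for t, f in enumerate(names) if f == "assignInitialPath"] != [1]:
        return False
    active, clones, kills = set(), 0, 0
    for name, inputs, output in calls[1:]:
        # Active: a path-index input must be active at the end of the previous stage
        if name in PATH_ARG and inputs[PATH_ARG[name]] not in active:
            return False
        if name in ("assignInitialPath", "clonePath"):
            active.add(output)
        if name == "clonePath":
            clones += 1
        elif name == "killPath":
            kills += 1
            active.discard(inputs[0])
        # Balanced: inequality (13) at the end of every stage t >= 1
        if not 1 <= 1 + clones - kills <= L:
            return False
    return True
```

For example, killing the only active path violates the lower bound in (13). Accessing the arrays of a killed path violates the Active trait.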

We start by stating that the most basic thing one would 
expect to hold does indeed hold. 

Lemma 3: Let $(f_t)_{t=0}^{T}$ be a valid sequence of calls to the
low-level functions implemented in Algorithms 8–12. Then,
the run is well defined: i) a "pop" operation is never carried
out on an empty stack, ii) a "push" operation never results in a
stack with more than L elements, and iii) a "read" operation
from any array defined in lines 2–7 of Algorithm 8 is always
preceded by a "write" operation to the same location in the
array.

Proof: The proof boils down to proving the following four statements concurrently for the end of each step 1 ≤ t ≤ T, by induction on t.



I. A path index ℓ is active by Definition 1 iff activePath[ℓ] is true iff inactivePathIndices does not contain the index ℓ.

II. The bracketed expression in (13) is the number of active paths at the end of stage t.

III. The value of arrayReferenceCount[λ][s] is positive iff the stack inactiveArrayIndices[λ] does not contain the index s, and is zero otherwise.

IV. The value of arrayReferenceCount[λ][s] is equal to the number of active paths ℓ for which pathIndexToArrayIndex[λ][ℓ] = s.

■

We are now close to formalizing the utility of our low-level functions. But first, we must formalize the concept of a descendant path. Let $(f_t)_{t=0}^{T}$ be a valid sequence of calls. Next, let ℓ be an active path index at the end of stage 1 ≤ t ≤ T. Henceforth, let us abbreviate the phrase "path index ℓ at the end of stage t" by "[ℓ, t]". We say that [ℓ', t + 1] is a child of [ℓ, t] if i) ℓ' is active at the end of stage t + 1, and ii) either ℓ' = ℓ or $f_{t+1}$ was the clonePath operation with input ℓ and output ℓ'. Likewise, we say that [ℓ', t'] is a descendant of [ℓ, t] if 1 ≤ t ≤ t' and there is a (possibly empty) hereditary chain of children leading from [ℓ, t] to [ℓ', t'].

We now broaden our definition of a valid function calling sequence by allowing reads and writes to arrays.

Fresh pointer: Consider the case where t > 1 and $f_t$ is either the getArrayPointer_P or getArrayPointer_C function with input (λ, ℓ) and output p. Then, for valid indices i, we allow read and write operations to p[i] after stage t, but only before any stage t' > t for which $f_{t'}$ is either clonePath or killPath.

Informally, the following lemma states that each path effectively sees a private set of arrays.

Lemma 4: Let $(f_t)_{t=0}^{T}$ be a valid sequence of calls to the low-level functions implemented in Algorithms 8–12. Assume the read/write operations between stages satisfy the "fresh pointer" condition.

Let the function $f_t$ be getArrayPointer_P with input (λ, ℓ) and output p. Similarly, for stage t' > t, let $f_{t'}$ be getArrayPointer_P with input (λ, ℓ') and output p'. Assume that [ℓ', t'] is a descendant of [ℓ, t].

Consider a "fresh pointer" write operation to p[i]. Similarly, consider a "fresh pointer" read operation from p'[i] carried out after the "write" operation. Then, assuming no intermediate "write" operations of the above nature, the value written is the value read.

A similar claim holds for getArrayPointer_C.

Proof: With the observations made in the proof of Lemma 3 at hand, a simple induction on t is all that is needed. ■
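The reference counting and copy-on-write behaviour that Lemma 4 relies on can be modelled for a single layer λ. In the Python sketch below, the class name `LayerPool` and its method names are hypothetical; its methods roughly mirror the per-layer part of assignInitialPath, clonePath, killPath and getArrayPointer_P, under the simplifying assumption that only one layer is tracked.

```python
from typing import List


class LayerPool:
    """Shared arrays of one layer λ with reference counts (a simplified model)."""

    def __init__(self, L: int, size: int):
        self.arrays: List[List[float]] = [[0.0] * size for _ in range(L)]
        self.ref_count = [0] * L                     # arrayReferenceCount[λ][s]
        self.inactive = list(range(L - 1, -1, -1))   # inactiveArrayIndices[λ], used as a stack
        self.path_to_array = [-1] * L                # pathIndexToArrayIndex[λ][ℓ]

    def assign(self, path: int) -> None:
        # Give a new path its own array.
        s = self.inactive.pop()
        self.path_to_array[path] = s
        self.ref_count[s] = 1

    def clone(self, src: int, dst: int) -> None:
        # Share the array of src with dst; nothing is copied yet.
        s = self.path_to_array[src]
        self.path_to_array[dst] = s
        self.ref_count[s] += 1

    def kill(self, path: int) -> None:
        # Drop one reference; recycle the array once nobody uses it.
        s = self.path_to_array[path]
        self.ref_count[s] -= 1
        if self.ref_count[s] == 0:
            self.inactive.append(s)
        self.path_to_array[path] = -1

    def get_pointer(self, path: int) -> List[float]:
        # Copy on write: a shared array is duplicated before it is handed out.
        s = self.path_to_array[path]
        if self.ref_count[s] == 1:
            return self.arrays[s]
        s_new = self.inactive.pop()
        self.arrays[s_new][:] = self.arrays[s]
        self.ref_count[s] -= 1
        self.ref_count[s_new] = 1
        self.path_to_array[path] = s_new
        return self.arrays[s_new]
```

For instance, after path 0 writes 0.5 into its array and is cloned into path 1, a write of 0.9 through a fresh pointer of path 1 triggers a private copy, so path 0 still reads 0.5.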

We end this section by noting that the function pathIndexInactive given in Algorithm 13 is simply a shorthand, meant to help readability later on.

B. Mid-level functions 



In this section we introduce Algorithms 14 and 15, which are our new implementations of Algorithms 6 and 7, respectively, for the list decoding setting.
Algorithm 13: pathIndexInactive(ℓ)






Input: path index ℓ
Output: true if path ℓ is inactive, and false otherwise

1  if activePath[ℓ] = true then
2      return false
3  else
4      return true



Algorithm 14: recursivelyCalcP(λ, φ), list version

Input: layer λ and phase φ

1  if λ = 0 then return  // Stopping condition
2  set ψ ← ⌊φ/2⌋
   // Recurse first, if needed
3  if φ mod 2 = 0 then recursivelyCalcP(λ − 1, ψ)
   // Perform the calculation
4  σ ← 0
5  for ℓ = 0, 1, ..., L − 1 do
6      if pathIndexInactive(ℓ) then
7          continue
8      P_λ ← getArrayPointer_P(λ, ℓ)
9      P_{λ−1} ← getArrayPointer_P(λ − 1, ℓ)
10     C_λ ← getArrayPointer_C(λ, ℓ)
11     for β = 0, 1, ..., 2^{m−λ} − 1 do
12         if φ mod 2 = 0 then
               // apply Equation (4)
13             for u' ∈ {0, 1} do
14                 P_λ[β][u'] ← Σ_{u''} ½ P_{λ−1}[2β][u' ⊕ u''] · P_{λ−1}[2β + 1][u'']
15                 σ ← max(σ, P_λ[β][u'])
16         else  // apply Equation (5)
17             set u' ← C_λ[β][0]
18             for u'' ∈ {0, 1} do
19                 P_λ[β][u''] ← ½ P_{λ−1}[2β][u' ⊕ u''] · P_{λ−1}[2β + 1][u'']
20                 σ ← max(σ, P_λ[β][u''])
   // normalize probabilities
21 for ℓ = 0, 1, ..., L − 1 do
22     if pathIndexInactive(ℓ) then
23         continue
24     P_λ ← getArrayPointer_P(λ, ℓ)
25     for β = 0, 1, ..., 2^{m−λ} − 1 do
26         for u ∈ {0, 1} do
27             P_λ[β][u] ← P_λ[β][u]/σ



One first notes that our new implementations loop over all path indices ℓ. Thus, they make use of the functions getArrayPointer_P and getArrayPointer_C in order to ensure that the consistency of calculations is preserved, despite multiple paths sharing information. In addition, Algorithm 14 contains code to normalize probabilities. The normalization is needed for a technical reason (to avoid floating-point underflow), and will be expanded on shortly.
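The per-layer arithmetic of Algorithm 14 (Equations (4) and (5), followed by division by a single maximum σ shared by all paths) can be sketched in Python as below. It assumes that the arrays of each active path have already been fetched through the getArrayPointer functions and are passed in as plain lists; the function name `calc_layer` and this calling convention are illustrative assumptions, and the recursion of line 3 is left out.

```python
from typing import List

Pair = List[float]  # [probability of bit 0, probability of bit 1]


def calc_layer(prev: List[List[Pair]], c_even: List[List[int]], phi: int) -> List[List[Pair]]:
    """One layer of the list version of recursivelyCalcP, without the recursion.

    prev[k]   : P_{λ-1} of the k-th active path (a list of 2B pairs)
    c_even[k] : C_λ[β][0] of that path for β = 0..B-1 (only read when phi is odd)
    """
    out: List[List[Pair]] = []
    sigma = 0.0
    for k, p_prev in enumerate(prev):
        layer: List[Pair] = []
        for beta in range(len(p_prev) // 2):
            left, right = p_prev[2 * beta], p_prev[2 * beta + 1]
            if phi % 2 == 0:
                # Equation (4): sum over the not-yet-decided bit u''.
                pair = [sum(0.5 * left[u1 ^ u2] * right[u2] for u2 in (0, 1)) for u1 in (0, 1)]
            else:
                # Equation (5): u' = C_λ[β][0] was already decided on this path.
                u1 = c_even[k][beta]
                pair = [0.5 * left[u1 ^ u2] * right[u2] for u2 in (0, 1)]
            sigma = max(sigma, *pair)
            layer.append(pair)
        out.append(layer)
    if sigma == 0.0:
        return out  # nothing to rescale (all entries are zero)
    # Normalization: a single σ for all paths, so their relative order is unchanged.
    return [[[v / sigma for v in pair] for pair in layer] for layer in out]
```

With a single path whose P_{λ−1} is [[0.9, 0.1], [0.8, 0.2]] and φ even, the unnormalized values are 0.37 and 0.13, which become 1 and 0.13/0.37 after normalization.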

We start out by noting that the "fresh pointer" condition we have imposed on ourselves indeed holds. To see this, consider first Algorithm 14. The key point to note is that neither the killPath nor the clonePath function is called from inside the algorithm. The same observation holds for Algorithm 15. Thus, the "fresh pointer" condition is met.

Algorithm 15: recursivelyUpdateC(λ, φ), list version

Input: layer λ and phase φ
Require: φ is odd

1  set ψ ← ⌊φ/2⌋
2  for ℓ = 0, 1, ..., L − 1 do
3      if pathIndexInactive(ℓ) then
4          continue
5      set C_λ ← getArrayPointer_C(λ, ℓ)
6      set C_{λ−1} ← getArrayPointer_C(λ − 1, ℓ)
7      for β = 0, 1, ..., 2^{m−λ} − 1 do
8          C_{λ−1}[2β][ψ mod 2] ← C_λ[β][0] ⊕ C_λ[β][1]
9          C_{λ−1}[2β + 1][ψ mod 2] ← C_λ[β][1]
10 if ψ mod 2 = 1 then
11     recursivelyUpdateC(λ − 1, ψ)

We now consider the normalization step carried out in lines 21–27 of Algorithm 14. Recall that a floating-point variable cannot be used to hold arbitrarily small positive reals, and in a typical implementation, the result of a calculation that is "too small" will be rounded to 0. This scenario is called an "underflow".

We now confess that all our previous implementations of SC decoders were prone to underflow. To see this, consider line 1 in the outline implementation given in Algorithm 2. Denote by Y and U the random vectors corresponding to y and u, respectively. For b ∈ {0, 1} we have that

$$W_m^{(\varphi)}\big(y_0^{n-1}, u_0^{\varphi-1} \mid b\big) = 2 \cdot \Pr\big(Y_0^{n-1} = y_0^{n-1},\ U_0^{\varphi-1} = u_0^{\varphi-1},\ U_\varphi = b\big) \le 2 \cdot \Pr\big(U_0^{\varphi-1} = u_0^{\varphi-1},\ U_\varphi = b\big) = 2^{-\varphi}.$$

Recall that φ iterates from 0 to n − 1. Thus, for codes having length greater than some small constant, the comparison in line 1 of Algorithm 2 ultimately becomes meaningless, since both probabilities are rounded to 0. The same holds for all of our previous implementations.

Luckily, there is a simple fix to this problem. After the probabilities are calculated in lines 5–20 of Algorithm 14, we normalize² the highest probability to be 1 in lines 21–27.

² This correction does not assure us that underflows will not occur. However, the probability of a meaningless comparison due to underflow will now be extremely low.

We claim that apart from avoiding underflows, normalization does not alter our algorithm. The following lemma formalizes this claim.

Lemma 5: Assume that we are working with "perfect" floating-point numbers. That is, our floating-point variables are infinitely accurate and do not suffer from underflow/overflow. Next, consider a variant of Algorithm 14, termed Algorithm 14', in which, just before line 21 is first executed, the variable σ is set to 1. That is, effectively, there is no normalization of probabilities in Algorithm 14'.

Consider two runs, one of Algorithm 14 and one of Algorithm 14'. In both runs, the input parameters to both algorithms are the same. Moreover, assume that in both runs, the state of the auxiliary data structures is the same, apart from the following.

Recall that our algorithm is recursive, and let $\lambda_0$ be the first value of the variable λ for which line 5 is executed. That is, $\lambda_0$ is the layer in which (both) algorithms do not perform preliminary recursive calculations. Assume that when we are at this base stage, λ = $\lambda_0$, the following holds: the values read from $P_{\lambda-1}$ in lines 14 and 19 in the run of Algorithm 14 are a multiple by $\alpha_{\lambda-1}$ of the corresponding values read in the run of Algorithm 14'. Then, for every λ ≥ $\lambda_0$, there exists a constant $\alpha_\lambda$ such that the values written to $P_\lambda$ in line 27 in the run of Algorithm 14 are a multiple by $\alpha_\lambda$ of the corresponding values written by Algorithm 14'.

Proof: For the base case λ = $\lambda_0$ we have by inspection that the constant $\alpha_\lambda$ is simply $(\alpha_{\lambda-1})^2$, divided by the value of σ after the main loop has finished executing in Algorithm 14. The claim for a general λ follows by induction. ■
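The underflow problem and the scaling invariance of Lemma 5 can both be seen in a toy Python experiment: two competing likelihoods are built up as products of per-step factors, once without and once with rescaling by the larger value σ after every step. The factors are arbitrary illustrative numbers rather than values produced by the decoder, the function name `run` is hypothetical, and IEEE-754 double precision (Python floats) is assumed.

```python
import math
from typing import List, Tuple


def run(factors: List[Tuple[float, float]], normalize: bool) -> Tuple[float, float]:
    """Accumulate two competing likelihoods step by step, optionally rescaling by σ."""
    p0, p1 = 1.0, 1.0
    for f0, f1 in factors:
        p0 *= f0
        p1 *= f1
        if normalize:
            sigma = max(p0, p1)   # the larger value becomes 1, as in lines 21-27
            p0, p1 = p0 / sigma, p1 / sigma
    return p0, p1


factors = [(0.3, 0.25)] * 2000
raw = run(factors, normalize=False)
scaled = run(factors, normalize=True)
print(raw)      # prints (0.0, 0.0): both products underflowed, the comparison is meaningless
print(scaled)   # first entry is 1.0; the ratio (0.25/0.3)**2000 is still representable
```

The rescaled pair keeps the same ratio p1/p0 that exact arithmetic would give, which is exactly what Lemma 5 guarantees for the decoder.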

C. High-level functions 

We now turn our attention to the high-level functions of our algorithm. Consider the topmost function, the main loop given in Algorithm 16. We start by noting that by lines 1 and 2, condition "initialized" in Definition 1 is satisfied. Also, for the inductive basis, condition "balanced" holds for t = 1 at the end of line 2. Next, notice that lines 3–5 are in line with our "fresh pointer" condition.

The main loop, lines 6–13, is the analog of the main loop in Algorithm 5. After the main loop has finished, we pick (in lines 14–16) the most likely codeword from our list and return it.



Algorithm 16: SCL decoder, main loop

Input: the received vector y and a list size L as a global
Output: a decoded codeword ĉ

   // Initialization
1  initializeDataStructures()
2  ℓ ← assignInitialPath()
3  P_0 ← getArrayPointer_P(0, ℓ)
4  for β = 0, 1, ..., n − 1 do
5      set P_0[β][0] ← W(y_β|0), P_0[β][1] ← W(y_β|1)
   // Main loop
6  for φ = 0, 1, ..., n − 1 do
7      recursivelyCalcP(m, φ)
8      if u_φ is frozen then
9          continuePaths_FrozenBit(φ)
10     else
11         continuePaths_UnfrozenBit(φ)
12     if φ mod 2 = 1 then
13         recursivelyUpdateC(m, φ)
   // Return the best codeword in the list
14 ℓ ← findMostProbablePath()
15 set C_0 ← getArrayPointer_C(0, ℓ)
16 return ĉ = (C_0[β][0])_{β=0}^{n−1}

We now expand on Algorithms 17 and 18. Algorithm 17 is straightforward: it is the analog of line 6 in Algorithm 5, applied to all active paths.

Algorithm 18 is the analog of lines 8–11 in Algorithm 5. However, now, instead of choosing the most likely fork out of 2 possible forks, we must typically choose the L most likely forks out of 2L possible forks. The most interesting line is 14, in which the best ρ forks are marked. Surprisingly,³ this can be done in O(L) time [10, Section 9.3]. After the forks are marked, we first kill the paths for which both forks are discontinued, and then continue the paths for which one or both of the forks are marked. If both forks of a path are marked, the path is first split (via clonePath). Note that we must first kill paths and only then split paths in order for the "balanced" constraint (13) to hold. Namely, this way, we will not have more than L active paths at a time.

Algorithm 17: continuePaths_FrozenBit(φ)
Input: phase φ

1  for ℓ = 0, 1, ..., L − 1 do
2      if pathIndexInactive(ℓ) then continue
3      C_m ← getArrayPointer_C(m, ℓ)
4      set C_m[0][φ mod 2] to the frozen value of u_φ

Algorithm 18: continuePaths_UnfrozenBit(φ)
Input: phase φ

1  probForks ← new 2-D float array of size L × 2
2  i ← 0
   // populate probForks
3  for ℓ = 0, 1, ..., L − 1 do
4      if pathIndexInactive(ℓ) then
5          probForks[ℓ][0] ← −1
6          probForks[ℓ][1] ← −1
7      else
8          P_m ← getArrayPointer_P(m, ℓ)
9          probForks[ℓ][0] ← P_m[0][0]
10         probForks[ℓ][1] ← P_m[0][1]
11         i ← i + 1
12 ρ ← min(2i, L)
13 contForks ← new 2-D boolean array of size L × 2
   // The following is possible in O(L) time
14 populate contForks such that contForks[ℓ][b] is true iff probForks[ℓ][b] is one of the ρ largest entries in probForks (and ties are broken arbitrarily)
   // First, kill off non-continuing paths
15 for ℓ = 0, 1, ..., L − 1 do
16     if pathIndexInactive(ℓ) then
17         continue
18     if contForks[ℓ][0] = false and contForks[ℓ][1] = false then
19         killPath(ℓ)
   // Then, continue relevant paths, and duplicate if necessary
20 for ℓ = 0, 1, ..., L − 1 do
21     if contForks[ℓ][0] = false and contForks[ℓ][1] = false then  // both forks are bad, or invalid
22         continue
23     C_m ← getArrayPointer_C(m, ℓ)
24     if contForks[ℓ][0] = true and contForks[ℓ][1] = true then  // both forks are good
25         set C_m[0][φ mod 2] ← 0
26         ℓ' ← clonePath(ℓ)
27         C_m ← getArrayPointer_C(m, ℓ')
28         set C_m[0][φ mod 2] ← 1
29     else  // exactly one fork is good
30         if contForks[ℓ][0] = true then
31             set C_m[0][φ mod 2] ← 0
32         else
33             set C_m[0][φ mod 2] ← 1
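Line 14 of Algorithm 18 can be sketched in Python with a selection based on the standard-library `heapq.nlargest`, in the spirit of footnote 3, which notes that for small L a sort-style selection is the practical choice. The function name `mark_forks` and the use of `None` for inactive paths (instead of the −1 entries of lines 5–6) are assumptions for illustration.

```python
import heapq
from typing import List, Optional


def mark_forks(prob_forks: List[Optional[List[float]]], L: int) -> List[List[bool]]:
    """Sketch of lines 1-14 of Algorithm 18: mark the rho = min(2i, L) most likely forks.

    prob_forks[l] is None for an inactive path, else [P_m[0][0], P_m[0][1]] of path l.
    """
    candidates = []
    for ell, pair in enumerate(prob_forks):
        if pair is not None:
            candidates += [(pair[0], ell, 0), (pair[1], ell, 1)]
    rho = min(len(candidates), L)  # len(candidates) equals 2i
    cont = [[False, False] for _ in prob_forks]
    # Ties are broken arbitrarily (here: by whatever order nlargest returns).
    for _, ell, b in heapq.nlargest(rho, candidates, key=lambda cand: cand[0]):
        cont[ell][b] = True
    return cont
```

With forks [0.9, 0.1] for path 0, an inactive path 1 and [0.5, 0.6] for path 2, list size L = 3 marks both forks of path 2, which would then be cloned in lines 24–28.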



The point of Algorithm 18 is to prune our list and leave only the L "best" paths. This is indeed achieved, in the following sense. At stage φ we would like to rank each path according to the probability

$$W_m^{(\varphi)}\big(y_0^{n-1}, \hat{u}_0^{\varphi-1} \mid \hat{u}_\varphi\big).$$

By (9) and (11), this would indeed be the case if our floating-point variables were "perfect" and the normalization step in lines 21–27 of Algorithm 14 were not carried out. By Lemma 5, we see that this is still the case if normalization is carried out.

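The last algorithm we consider in this section is Algorithm 19. In it, the most probable path is selected from the final list. As before, by (9)–(12) and Lemma 5, the value of P_m[0][C_m[0][1]] is simply

$$W_m^{(n-1)}\big(y_0^{n-1}, \hat{u}_0^{n-2} \mid \hat{u}_{n-1}\big) = \frac{1}{2^{n-1}} \cdot \Pr\big(Y_0^{n-1} = y_0^{n-1} \mid U_0^{n-1} = \hat{u}_0^{n-1}\big)$$

up to a normalization constant.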



Algorithm 19: findMostProbablePath()
Output: the index ℓ' of the most probable path

1  ℓ' ← 0, p' ← 0
2  for ℓ = 0, 1, ..., L − 1 do
3      if pathIndexInactive(ℓ) then
4          continue
5      C_m ← getArrayPointer_C(m, ℓ)
6      P_m ← getArrayPointer_P(m, ℓ)
7      if p' < P_m[0][C_m[0][1]] then
8          ℓ' ← ℓ, p' ← P_m[0][C_m[0][1]]
9  return ℓ'
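For experimentation, Algorithms 14–19 can be condensed into a short Python sketch. It is a simplification under stated assumptions, not the decoder analyzed here: every path owns full copies of its P and C arrays and clonePath becomes a deep copy (so the O(L · n log n) bound is lost; the lazy copying of Algorithms 8–12 is what recovers it), the channel is assumed to be a BSC with crossover probability p, frozen bits are assumed to be 0, the fork selection sorts all 2L candidates, and the names `encode`, `Path` and `scl_decode` are hypothetical. The encoder simply applies the C-update rule of Algorithm 15 from layer m down to layer 0, so that it matches the decoder's indexing, and the decoder returns the decided bits û rather than the codeword.

```python
import copy
from typing import List, Set


def encode(u: List[int]) -> List[int]:
    """Polar transform whose butterfly matches the C-update of Algorithm 15."""
    layer = [[bit] for bit in u]            # layer[phase][branch], starting at λ = m
    while len(layer) > 1:
        nxt = []
        for psi in range(len(layer) // 2):
            even, odd = layer[2 * psi], layer[2 * psi + 1]
            row: List[int] = []
            for beta in range(len(even)):
                row += [even[beta] ^ odd[beta], odd[beta]]
            nxt.append(row)
        layer = nxt
    return layer[0]


class Path:
    def __init__(self, m: int):
        # P[λ][β] = [prob. of bit 0, prob. of bit 1]; C[λ][β] = [even-phase bit, odd-phase bit]
        self.P = [[[0.0, 0.0] for _ in range(2 ** (m - lam))] for lam in range(m + 1)]
        self.C = [[[0, 0] for _ in range(2 ** (m - lam))] for lam in range(m + 1)]
        self.u: List[int] = []


def scl_decode(y: List[int], p: float, frozen: Set[int], L: int) -> List[int]:
    """Simplified SCL decoding over a BSC(p), 0 < p < 1/2; frozen bits are assumed to be 0."""
    n = len(y)
    m = n.bit_length() - 1
    root = Path(m)
    root.P[0] = [[1 - p, p] if bit == 0 else [p, 1 - p] for bit in y]   # W(y_β | b)
    paths = [root]

    def calc_p(lam: int, phi: int) -> None:          # cf. Algorithm 14
        if lam == 0:
            return
        psi = phi // 2
        if phi % 2 == 0:
            calc_p(lam - 1, psi)
        sigma = 0.0
        for path in paths:
            cur, prev = path.P[lam], path.P[lam - 1]
            for beta in range(len(cur)):
                left, right = prev[2 * beta], prev[2 * beta + 1]
                if phi % 2 == 0:                     # Equation (4)
                    cur[beta] = [sum(0.5 * left[a ^ b] * right[b] for b in (0, 1)) for a in (0, 1)]
                else:                                # Equation (5)
                    a = path.C[lam][beta][0]
                    cur[beta] = [0.5 * left[a ^ b] * right[b] for b in (0, 1)]
                sigma = max(sigma, *cur[beta])
        for path in paths:                           # normalization by the shared σ (> 0 here)
            path.P[lam] = [[v / sigma for v in pair] for pair in path.P[lam]]

    def update_c(lam: int, phi: int) -> None:        # cf. Algorithm 15 (phi is odd)
        psi = phi // 2
        for path in paths:
            cur, prev = path.C[lam], path.C[lam - 1]
            for beta in range(len(cur)):
                prev[2 * beta][psi % 2] = cur[beta][0] ^ cur[beta][1]
                prev[2 * beta + 1][psi % 2] = cur[beta][1]
        if psi % 2 == 1:
            update_c(lam - 1, psi)

    for phi in range(n):                             # cf. Algorithm 16, lines 6-13
        calc_p(m, phi)
        if phi in frozen:                            # cf. Algorithm 17
            for path in paths:
                path.C[m][0][phi % 2] = 0
                path.u.append(0)
        else:                                        # cf. Algorithm 18, with full copies
            forks = [(path.P[m][0][b], k, b) for k, path in enumerate(paths) for b in (0, 1)]
            forks.sort(key=lambda fork: fork[0], reverse=True)
            survivors = []
            for _, k, b in forks[:L]:
                child = copy.deepcopy(paths[k])
                child.C[m][0][phi % 2] = b
                child.u.append(b)
                survivors.append(child)
            paths = survivors
        if phi % 2 == 1:
            update_c(m, phi)

    # cf. Algorithm 19: pick the path maximizing P_m[0][C_m[0][1]]
    best = max(paths, key=lambda path: path.P[m][0][path.C[m][0][1]])
    return best.u
```

In this toy setting (length 8, frozen set {0, 1, 2, 4} chosen arbitrarily, y equal to the transmitted codeword, p = 0.001), both L = 1 and L = 4 return the transmitted u.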



We now prove our two main results.

Theorem 6: The space complexity of the SCL decoder is O(L · n).

Proof: All the data structures of our list decoder are allocated in Algorithm 8, and it can be checked that the total space used by them is O(L · n). Apart from these, the space needed in order to perform the selection operation in line 14 of Algorithm 18 is O(L). Lastly, the various local variables needed by the algorithm take O(1) space, and the stack needed in order to implement the recursion takes O(log n) space. ■

Theorem 7: The running time of the SCL decoder is O(L · n log n).

³ The O(L) time result is rather theoretical. Since L is typically a small number, the fastest way to achieve our selection goal would be through simple sorting.



Proof: Recall that by our notation m = log n. The following bottom-to-top table summarizes the running time of each function. The notation O_Σ will be explained shortly.

function                          running time
initializeDataStructures()        O(L · m)
assignInitialPath()               O(m)
clonePath(ℓ)                      O(m)
killPath(ℓ)                       O(m)
getArrayPointer_P(λ, ℓ)           O(2^{m−λ})
getArrayPointer_C(λ, ℓ)           O(2^{m−λ})
pathIndexInactive(ℓ)              O(1)
recursivelyCalcP(m, ·)            O_Σ(L · m · n)
recursivelyUpdateC(m, ·)          O_Σ(L · m · n)
continuePaths_FrozenBit(φ)        O(L)
continuePaths_UnfrozenBit(φ)      O(L · m)
findMostProbablePath()            O(L)
SCL decoder                       O(L · m · n)

The first 7 functions in the table, the low-level functions, are easily checked to have the stated running time. Note that the running time of getArrayPointer_P and getArrayPointer_C is due to the copy operation (line 6 of getArrayPointer_P, and its analog in getArrayPointer_C), applied to an array of size O(2^{m−λ}). Thus, as was previously mentioned, reducing the size of our arrays has helped us reduce the running time of our list decoding algorithm.

Next, let us consider the two mid-level functions, namely, recursivelyCalcP and recursivelyUpdateC. The notation

recursivelyCalcP(m, ·) ∈ O_Σ(L · m · n)

means that the total running time of the n function calls

recursivelyCalcP(m, φ),  0 ≤ φ < 2^m,

is O(L · m · n). To see this, denote by f(λ) the total running time of the above with m replaced by λ. By splitting the running time of Algorithm 14 into a non-recursive part and a recursive part, we have that for λ > 0

$$f(\lambda) = 2^{\lambda} \cdot O\big(L \cdot 2^{m-\lambda}\big) + f(\lambda - 1).$$

Thus, it easily follows that

$$f(m) \in O\big(L \cdot m \cdot 2^{m}\big) = O(L \cdot m \cdot n).$$

In essentially the same way, we can prove that the total running time of recursivelyUpdateC(m, φ) over all 2^{m−1} valid (odd) values of φ is O(L · m · n). Note that the two mid-level functions are invoked in lines 7 and 13 of Algorithm 16 on all valid inputs.
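The counting argument can be checked by simulating only the call pattern of recursivelyCalcP and charging L · 2^{m−λ} units per call at layer λ for the loops over ℓ and β. The unit cost per (path, β) pair is an assumption standing in for the constant hidden in O(·), and the function name `calc_p_work` is hypothetical.

```python
def calc_p_work(m: int, L: int) -> int:
    """Total work of recursivelyCalcP(m, phi) over 0 <= phi < 2**m, one unit per (path, beta)."""
    work = 0

    def calc(lam: int, phi: int) -> None:
        nonlocal work
        if lam == 0:
            return
        if phi % 2 == 0:
            calc(lam - 1, phi // 2)      # recursive part
        work += L * 2 ** (m - lam)      # non-recursive part: loops over l and beta

    for phi in range(2 ** m):
        calc(m, phi)
    return work


print(calc_p_work(10, 8))  # 81920 = 8 * 10 * 1024, i.e. L * m * n
```

Each layer λ is visited once per phase 0 ≤ φ < 2^λ, so every layer contributes exactly L · 2^m units and the total is L · m · n.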

The running time of the high-level functions is easily 
checked to agree with the table. 



V. Modified polar codes 

The plots in Figure 5 were obtained by simulation. The performance of our decoder for various list sizes is given by the solid lines in the figure. As expected, we see that as the list size L increases, the performance of our decoder improves.
Fig. 5. Word error rate of a length n = 2048 (top) and n = 8192 (bottom) rate 1/2 polar code optimized for SNR = 2 dB under various list sizes. Code construction was carried out via the method proposed in [4].



We also notice a diminishing-returns phenomenon in terms of 
increasing the list size. The reason for this turns out to be 
simple. 

The dashed line, termed the "ML bound", was obtained as follows. During our simulations for L = 32, each time a decoding failure occurred, we checked whether the decoded codeword was more likely than the transmitted codeword. That is, whether W(y|ĉ) > W(y|c). If so, then the optimal ML decoder would surely misdecode y as well. The dashed line records the frequency of the above event, and is thus a lower bound on the error probability of the ML decoder. Thus, for an SNR value greater than about 1.5 dB, Figure 1 suggests that we have an essentially optimal decoder when L = 32.
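The ML-bound test can be illustrated in Python for a binary symmetric channel, chosen here only for simplicity; for another channel, such as AWGN, the same comparison applies with that channel's log-likelihood. For a BSC with crossover probability p < 1/2, W(y|c) = p^d (1 − p)^{n−d}, where d is the Hamming distance between y and c. Working with logarithms sidesteps the underflow issue discussed earlier. The function names are hypothetical.

```python
import math
from typing import Sequence


def bsc_log_likelihood(y: Sequence[int], c: Sequence[int], p: float) -> float:
    """log W(y | c) for a BSC with crossover probability p."""
    d = sum(a != b for a, b in zip(y, c))
    return d * math.log(p) + (len(y) - d) * math.log(1 - p)


def ml_would_also_err(y: Sequence[int], decoded: Sequence[int],
                      transmitted: Sequence[int], p: float) -> bool:
    """True if W(y | decoded) > W(y | transmitted), so an ML decoder would also fail."""
    return bsc_log_likelihood(y, decoded, p) > bsc_log_likelihood(y, transmitted, p)
```

Counting how often this returns True over all decoding failures, divided by the number of transmitted words, gives the lower bound on the ML error rate described above.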

Can we do even better? At first, the answer seems to be an 
obvious "no", at least for the region in which our decoder is 
essentially optimal. However, it turns out that if we are willing 
to accept a small change in our definition of a polar code, we 
can dramatically improve performance. 

During simulations we noticed that often, when a decoding 
error occurred, the path corresponding to the transmitted 
codeword was a member of the final list. However, since there 
was a more likely path in the list, the codeword corresponding 
to that path was returned, which resulted in a decoding error. 
Thus, if only we had a "genie" to tell us at the final stage which
path to pick from our list, we could improve the performance 
of our decoder. 

Luckily, such a genie is easy to implement. Recall that we have k unfrozen bits that we are free to set. Instead of setting all of them to information bits we wish to transmit, we employ the following simple concatenation scheme. For some small constant r, we set the first k − r unfrozen bits to information bits. The last r unfrozen bits will hold the r-bit CRC [11, Section 8.8] value⁴ of the first k − r unfrozen bits. Note that this new encoding is a slight variation of our polar coding scheme. Also, note that we incur a penalty in rate, since the rate of our code is now (k − r)/n instead of the previous k/n.

What we have gained is an approximation to a genie: at the final stage of decoding, instead of calling the function findMostProbablePath in Algorithm 19, we can do the following. A path for which the CRC is invalid cannot correspond to the transmitted codeword. Thus, we refine our selection as follows. If at least one path has a correct CRC, then we remove from our list all paths having an incorrect CRC and then choose the most likely path. Otherwise, we select the most likely path in the hope of reducing the number of bits in error, but with the knowledge that we have at least one bit in error.
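The CRC-aided selection just described can be sketched in Python. Bits are handled as lists of 0/1, the CRC is computed by plain polynomial long division over GF(2) (no bit reflection, zero initial register), and the generator x³ + x + 1 used in the example is an illustrative choice only, not one prescribed here. The names `crc_bits`, `crc_ok` and `select_with_crc` are hypothetical, and each list entry is assumed to carry its path likelihood together with its k unfrozen bits.

```python
from typing import List, Sequence, Tuple


def crc_bits(data: Sequence[int], poly: Sequence[int]) -> List[int]:
    """r-bit CRC remainder of data; poly holds the r + 1 generator coefficients, MSB first."""
    r = len(poly) - 1
    reg = list(data) + [0] * r
    for i in range(len(data)):
        if reg[i]:
            for j in range(r + 1):
                reg[i + j] ^= poly[j]
    return reg[-r:]


def crc_ok(bits: Sequence[int], k_minus_r: int, poly: Sequence[int]) -> bool:
    """Do the last r unfrozen bits equal the CRC of the first k - r?"""
    return crc_bits(bits[:k_minus_r], poly) == list(bits[k_minus_r:])


def select_with_crc(candidates: List[Tuple[float, List[int]]], k_minus_r: int,
                    poly: Sequence[int]) -> List[int]:
    """candidates: (path likelihood, unfrozen bits) for every path in the final list."""
    passing = [cand for cand in candidates if crc_ok(cand[1], k_minus_r, poly)]
    pool = passing if passing else candidates   # fall back to the most likely path overall
    return max(pool, key=lambda cand: cand[0])[1]
```

For example, with generator [1, 0, 1, 1] the CRC of [1, 1, 0, 1] is [0, 0, 1], so a less likely path ending in those check bits is preferred over a more likely path whose CRC fails.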

Figures 1 and 2 contain a comparison of decoding performance between the original polar codes and the slightly tweaked version presented in this section. A further improvement in bit-error rate (but not in block-error rate) is attained when the decoding is performed systematically [12]. The application of systematic polar coding to a list decoding setting is attributed to [13].
⁴ A binary linear code having a corresponding r × k parity-check matrix constructed as follows will do just as well. Let the first k − r columns be chosen at random and the last r columns be equal to the identity matrix.
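Footnote 4's alternative to a CRC can be sketched in Python: with H = [A | I_r], where A is a random r × (k − r) binary matrix, the r check bits for data bits d are A·d (mod 2), since then H·x = A·d + A·d = 0 over GF(2). The helper names and the seeded random generator are assumptions for illustration.

```python
import random
from typing import List


def random_check_matrix(k: int, r: int, seed: int = 0) -> List[List[int]]:
    """H = [A | I_r]: first k - r columns random, last r columns the identity."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k - r)] + [int(i == j) for j in range(r)]
            for i in range(r)]


def check_bits(H: List[List[int]], data: List[int]) -> List[int]:
    """Check bits c = A d (mod 2), so that x = data + c satisfies H x = 0."""
    return [sum(row[j] * data[j] for j in range(len(data))) % 2 for row in H]


def passes(H: List[List[int]], x: List[int]) -> bool:
    """True iff every parity check of H is satisfied by x."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)
```

In the list decoder, `passes` would play the role of the CRC test when filtering the final list.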



